We present a multilingual Named Entity Recognition approach based on a robust and general set of features across languages and datasets. Our system combines shallow local information with clustering semi-supervised features induced on large amounts of unlabeled text. Understanding, via empirical experimentation, how to effectively combine various types of clustering features allows us to seamlessly export our system to other datasets and languages. The result is a simple but highly competitive system which obtains state-of-the-art results across five languages and twelve datasets. The results are reported on standard shared task evaluation data such as CoNLL for English, Spanish and Dutch. Furthermore, and despite the lack of linguistically motivated features, we also report best results for languages such as Basque and German. In addition, we demonstrate that our method also obtains very competitive results even when the amount of supervised data is cut by half, alleviating the dependency on manually annotated data. Finally, the results show that our emphasis on clustering features is crucial to develop robust out-of-domain models. The system and models are freely available to facilitate their use and guarantee the reproducibility of results.